Gesture Recognition


A Comparative Study of EMG- and IMU-based Gesture Recognition at the Wrist and Forearm

Baghernezhad, Soroush, Mohammadreza, Elaheh, da Fonseca, Vinicius Prado, Zou, Ting, Jiang, Xianta

arXiv.org Artificial Intelligence

Gestures are an integral part of our daily interactions with the environment. Hand gesture recognition (HGR) is the process of interpreting human intent through various input modalities, such as visual data (images and videos) and bio-signals. Bio-signals are widely used in HGR due to their ability to be captured non-invasively via sensors placed on the arm. Among these, surface electromyography (sEMG), which measures the electrical activity of muscles, is the most extensively studied modality. However, less-explored alternatives such as inertial measurement units (IMUs) can provide complementary information on subtle muscle movements, which makes them valuable for gesture recognition. In this study, we investigate the potential of using IMU signals from different muscle groups to capture user intent. Our results demonstrate that IMU signals contain sufficient information to serve as the sole input sensor for static gesture recognition. Moreover, we compare different muscle groups and check the quality of pattern recognition on individual muscle groups. We further found that tendon-induced micro-movement captured by IMUs is a major contributor to static gesture recognition. We believe that leveraging muscle micro-movement information can enhance the usability of prosthetic arms for amputees. This approach also offers new possibilities for hand gesture recognition in fields such as robotics, teleoperation, sign language interpretation, and beyond.
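To make the IMU-only claim concrete, here is a minimal sketch of static gesture recognition from a single IMU window: hand-crafted time-domain features plus a nearest-centroid classifier. The feature set (per-axis mean and RMS) and the classifier are illustrative assumptions for exposition, not the paper's actual pipeline.

```python
# Sketch: static gesture recognition from IMU windows alone.
# Features (per-axis mean, RMS) and nearest-centroid matching are
# illustrative assumptions, not the paper's method.
import math

def window_features(window):
    """window: list of (ax, ay, az) accelerometer samples.
    Returns per-axis mean and RMS, a common time-domain feature set."""
    n = len(window)
    feats = []
    for axis in range(3):
        vals = [s[axis] for s in window]
        mean = sum(vals) / n
        rms = math.sqrt(sum(v * v for v in vals) / n)
        feats.extend([mean, rms])
    return feats

def nearest_centroid(feats, centroids):
    """centroids: dict gesture_name -> feature vector of same length."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(centroids, key=lambda g: dist(feats, centroids[g]))

# Toy usage with two synthetic static gestures.
rest = [(0.0, 0.0, 1.0)] * 50   # hand at rest: gravity on z only
fist = [(0.2, 0.1, 0.9)] * 50   # tendon micro-movement shifts the axes
centroids = {
    "rest": window_features(rest),
    "fist": window_features(fist),
}
probe = [(0.19, 0.11, 0.91)] * 50
print(nearest_centroid(window_features(probe), centroids))  # → fist
```

The point of the toy example is only that steady micro-shifts in IMU axes are separable with very simple features, consistent with the paper's finding that IMUs can serve as the sole input for static gestures.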


ReactEMG: Stable, Low-Latency Intent Detection from sEMG via Masked Modeling

Wang, Runsheng, Zhu, Xinyue, Chen, Ava, Xu, Jingxi, Winterbottom, Lauren, Nilsen, Dawn M., Stein, Joel, Ciocarlie, Matei

arXiv.org Artificial Intelligence

Surface electromyography (sEMG) signals show promise for effective human-machine interfaces, particularly in rehabilitation and prosthetics. However, challenges remain in developing systems that respond quickly to user intent, produce stable flicker-free output suitable for device control, and work across different subjects without time-consuming calibration. In this work, we propose a framework for EMG-based intent detection that addresses these challenges. We cast intent detection as per-timestep segmentation of continuous sEMG streams, assigning labels as gestures unfold in real time. We introduce a masked modeling training strategy that aligns muscle activations with their corresponding user intents, enabling rapid onset detection and stable tracking of ongoing gestures. In evaluations against baseline methods, using metrics that capture accuracy, latency and stability for device control, our approach achieves state-of-the-art performance in zero-shot conditions. These results demonstrate its potential for wearable robotics and next-generation prosthetic systems. Our project website, video, code, and dataset are available at: https://reactemg.github.io/
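The masked-modeling idea above can be sketched as follows: hide random spans of an sEMG window and train the model to recover per-timestep targets at the hidden positions. The span length and mask ratio below are illustrative assumptions, not ReactEMG's published hyperparameters.

```python
# Sketch: span masking for masked-modeling pretraining over a
# continuous sEMG stream. Hyperparameters are illustrative.
import random

def mask_spans(seq, mask_ratio=0.3, span=5, mask_token=None, rng=None):
    """Return (masked_seq, mask) where mask[t] is True at hidden steps.
    Spans of `span` consecutive timesteps are hidden until roughly
    mask_ratio of the sequence is covered; a per-timestep loss would
    then be computed only at masked positions."""
    rng = rng or random.Random(0)
    n = len(seq)
    mask = [False] * n
    target = int(mask_ratio * n)
    while sum(mask) < target:
        start = rng.randrange(0, max(1, n - span))
        for t in range(start, min(n, start + span)):
            mask[t] = True
    masked = [mask_token if m else x for x, m in zip(seq, mask)]
    return masked, mask

seq = list(range(20))               # stand-in for one sEMG channel
masked, mask = mask_spans(seq)
```

Training to fill contiguous hidden spans, rather than isolated timesteps, is what encourages the model to track an ongoing gesture stably instead of flickering between labels.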


SASG-DA: Sparse-Aware Semantic-Guided Diffusion Augmentation For Myoelectric Gesture Recognition

Liu, Chen, Han, Can, Xu, Weishi, Wang, Yaqi, Qian, Dahong

arXiv.org Artificial Intelligence

Surface electromyography (sEMG)-based gesture recognition plays a critical role in human-machine interaction (HMI), particularly for rehabilitation and prosthetic control. However, sEMG-based systems often suffer from the scarcity of informative training data, leading to overfitting and poor generalization in deep learning models. Data augmentation offers a promising approach to increasing the size and diversity of training data, where faithfulness and diversity are two critical factors to effectiveness. However, promoting untargeted diversity can result in redundant samples with limited utility. To address these challenges, we propose a novel diffusion-based data augmentation approach, Sparse-Aware Semantic-Guided Diffusion Augmentation (SASG-DA). To enhance generation faithfulness, we introduce the Semantic Representation Guidance (SRG) mechanism by leveraging fine-grained, task-aware semantic representations as generation conditions. To enable flexible and diverse sample generation, we propose a Gaussian Modeling Semantic Sampling (GMSS) strategy, which models the semantic representation distribution and allows stochastic sampling to produce both faithful and diverse samples. To enhance targeted diversity, we further introduce a Sparse-Aware Semantic Sampling (SASS) strategy to explicitly explore underrepresented regions, improving distribution coverage and sample utility. Extensive experiments on benchmark sEMG datasets, Ninapro DB2, DB4, and DB7, demonstrate that SASG-DA significantly outperforms existing augmentation methods. Overall, our proposed data augmentation approach effectively mitigates overfitting and improves recognition performance and generalization by offering both faithful and diverse samples. Gesture recognition serves as a fundamental technology for advancing human-machine interaction.
Among various gesture recognition modalities, surface electromyography (sEMG)-based approaches have gained increasing attention due to their non-invasive nature, high temporal resolution, and ability to directly capture muscle activation signals associated with voluntary movement [1].
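The GMSS idea can be sketched as fitting a Gaussian over a class's semantic representations and drawing stochastic generation conditions from it. A diagonal (per-dimension) Gaussian is an assumption made here for brevity, and the temperature knob only loosely mirrors the sparse-aware push toward underrepresented regions; the paper's exact parameterization may differ.

```python
# Sketch: Gaussian Modeling Semantic Sampling (GMSS), simplified to a
# diagonal Gaussian per class. The `temperature` parameter is an
# illustrative stand-in for sparse-aware coverage widening.
import random
import statistics

def fit_gaussian(reps):
    """reps: list of equal-length semantic vectors for one gesture class.
    Returns per-dimension (means, stds)."""
    dims = list(zip(*reps))
    means = [statistics.fmean(d) for d in dims]
    stds = [statistics.pstdev(d) for d in dims]
    return means, stds

def sample_condition(means, stds, temperature=1.0, rng=None):
    """Draw one stochastic generation condition for the diffusion model;
    temperature > 1 widens sampling away from the class mean."""
    rng = rng or random.Random(0)
    return [rng.gauss(m, s * temperature) for m, s in zip(means, stds)]

means, stds = fit_gaussian([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
cond = sample_condition(means, stds, temperature=1.5)
```

Sampling conditions from the fitted distribution, rather than reusing observed representations, is what lets the generator produce samples that are faithful to the class yet not duplicates of the training set.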


Achieving Effective Virtual Reality Interactions via Acoustic Gesture Recognition based on Large Language Models

Zhang, Xijie, He, Fengliang, Dai, Hong-Ning

arXiv.org Artificial Intelligence

Natural and efficient interaction remains a critical challenge for virtual reality and augmented reality (VR/AR) systems. Vision-based gesture recognition suffers from high computational cost, sensitivity to lighting conditions, and privacy leakage concerns. Acoustic sensing provides an attractive alternative: by emitting inaudible high-frequency signals and capturing their reflections, channel impulse response (CIR) encodes how gestures perturb the acoustic field in a low-cost and user-transparent manner. However, existing CIR-based gesture recognition methods often rely on extensive training of models on large labeled datasets, making them unsuitable for few-shot VR scenarios. In this work, we propose the first framework that leverages large language models (LLMs) for CIR-based gesture recognition in VR/AR systems. Despite LLMs' strengths, it is non-trivial to achieve few-shot and zero-shot learning of CIR gestures due to their inconspicuous features. To tackle this challenge, we collect differential CIR rather than original CIR data. Moreover, we construct a real-world dataset collected from 10 participants performing 15 gestures across three categories (digits, letters, and shapes), with 10 repetitions each. We then conduct extensive experiments on this dataset using an LLM-adopted classifier. Results show that our LLM-based framework achieves accuracy comparable to classical machine learning baselines, while requiring no domain-specific retraining.
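The differential-CIR step described above can be sketched as a frame-to-frame difference of CIR magnitude vectors, which cancels static reflections and keeps only gesture-induced changes. The exact differencing scheme here is an assumption for illustration, not the paper's definition.

```python
# Sketch: differential CIR. Static reflections (walls, torso) produce
# constant taps that cancel under differencing; a moving hand does not.
def differential_cir(frames):
    """frames: list of CIR magnitude vectors, one per time step.
    Returns len(frames) - 1 difference vectors."""
    return [
        [b - a for a, b in zip(prev, cur)]
        for prev, cur in zip(frames, frames[1:])
    ]

# A static scene yields all-zero differences; motion shows up directly.
static = differential_cir([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])
moving = differential_cir([[1.0, 2.0], [1.0, 3.0], [2.0, 3.0]])
```

Making motion explicit in the input is what gives the LLM classifier conspicuous features to work with in the few-shot setting.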


Accurate online action and gesture recognition system using detectors and Deep SPD Siamese Networks

Akremi, Mohamed Sanim, Slama, Rim, Tabia, Hedi

arXiv.org Artificial Intelligence

Human activity recognition is an important research topic in pattern recognition field. It has been the subject of many studies in the past two decades because of its importance in numerous areas such as security, health, daily activity, energy consumption and robotics. Recently, some works on the recognition of hand gestures or human actions from skeletal data are based on the modeling of the skeleton's movement as manifold-based representation and proposed deep neural networks on this structure [1, 2, 3]. These approaches demonstrated their potential in the processing of skeletal data. Most of them are applied on offline human action recognition which is useful in time-limited tasks. However, in many applications, simply recognizing a single gesture in a given segmented sequence is not enough, especially in monitoring systems and virtual-reality devices which need to detect human movements moment by moment in continuous videos. In these online recognition systems, it is important to detect the existence of an action as early as possible after its beginning. It is also essential to determine the nature of the movement within a sequence of frames, without having information about the number of gestures present within the video, their starting times or their durations, unlike the segmented action recognition. In this paper, we propose to use a manifold-based model in order to build an online motion recognition system that detects and identifies different human activities in unsegmented skeletal sequences.
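A common concrete instance of the manifold-based representation mentioned above is the covariance matrix of joint coordinates over a window, which (with a small regularizer) is symmetric positive definite (SPD). The sketch below builds only that descriptor, under the assumption that the Siamese network then compares such matrices; it is not the paper's full model.

```python
# Sketch: an SPD skeleton descriptor, i.e. the regularized covariance
# of joint coordinates over a window of frames. Illustrative only.
def spd_descriptor(frames, eps=1e-6):
    """frames: list of flattened joint-coordinate vectors (one per frame).
    Returns the regularized covariance matrix as a list of rows."""
    n, d = len(frames), len(frames[0])
    means = [sum(f[i] for f in frames) / n for i in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in frames:
        c = [x - m for x, m in zip(f, means)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j] / n
    for i in range(d):
        cov[i][i] += eps  # regularize so the matrix is strictly SPD
    return cov

cov = spd_descriptor([[0.0, 0.0], [1.0, 1.0]])
```

Because covariance captures how joints co-vary rather than where they are, the descriptor is naturally robust to translation, which is one reason SPD representations suit skeleton data.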


Spiking Patches: Asynchronous, Sparse, and Efficient Tokens for Event Cameras

Øhrstrøm, Christoffer Koo, Güldenring, Ronja, Nalpantidis, Lazaros

arXiv.org Artificial Intelligence

We propose tokenization of events and present a tokenizer, Spiking Patches, specifically designed for event cameras. Given a stream of asynchronous and spatially sparse events, our goal is to discover an event representation that preserves these properties. Prior works have represented events as frames or as voxels. However, while these representations yield high accuracy, both frames and voxels are synchronous and decrease the spatial sparsity. Spiking Patches gives the means to preserve the unique properties of event cameras and we show in our experiments that this comes without sacrificing accuracy. We evaluate our tokenizer using a GNN, PCN, and a Transformer on gesture recognition and object detection. Tokens from Spiking Patches yield inference times that are up to 3.4x faster than voxel-based tokens and up to 10.4x faster than frames. We achieve this while matching their accuracy and even surpassing in some cases with absolute improvements up to 3.8 for gesture recognition and up to 1.4 for object detection. Thus, tokenization constitutes a novel direction in event-based vision and marks a step towards methods that preserve the properties of event cameras.
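The patch-based tokenization idea can be sketched as binning events by spatial patch and emitting a token only once a patch accumulates enough events, so the output stays asynchronous and sparse. The patch size and threshold are illustrative, and the accumulate-and-fire rule here is an assumption; Spiking Patches' actual spiking mechanism differs in detail.

```python
# Sketch: asynchronous patch tokenization of an event stream.
# A patch "fires" a token once it holds `threshold` events.
# Parameters are illustrative, not the paper's.
from collections import defaultdict

def tokenize_events(events, patch=8, threshold=4):
    """events: iterable of (t, x, y, polarity) tuples.
    Yields (t, patch_x, patch_y, [events]) tokens as patches fill up."""
    buffers = defaultdict(list)
    for ev in events:
        t, x, y, _ = ev
        key = (x // patch, y // patch)
        buffers[key].append(ev)
        if len(buffers[key]) >= threshold:
            yield (t, key[0], key[1], buffers.pop(key))

events = [(i, 3, 3, 1) for i in range(4)] + [(4, 20, 20, 1)]
tokens = list(tokenize_events(events))
```

Only the active patch emits a token; the lone event at (20, 20) stays buffered, which is the sparsity-preserving behavior the abstract contrasts with dense frames and voxels.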